Trace Ratio Criterion for Feature Selection
Authors
Abstract
Fisher score and Laplacian score are two popular feature selection algorithms, both of which belong to the general graph-based feature selection framework. In this framework, a feature subset is selected based on its score (the subset-level score), which is calculated in a trace ratio form. Because the number of possible feature subsets is enormous, a brute-force search for the subset with the maximum subset-level score is often prohibitively expensive. Instead of scoring every feature subset, traditional methods compute a score for each feature and then select the leading features according to the ranking of these feature-level scores. However, selecting the subset by feature-level scores cannot guarantee that the subset-level score is optimal. In this paper, we directly optimize the subset-level score and propose a novel algorithm that efficiently finds the globally optimal feature subset, that is, the subset that maximizes the subset-level score. Extensive experiments demonstrate the effectiveness of the proposed algorithm in comparison with traditional feature selection methods.
Introduction
Many classification tasks must deal with high-dimensional data. Data with a large number of features incur higher computational cost, and irrelevant and redundant features may also degrade classification performance. Feature selection is one of the most important approaches for dealing with high-dimensional data (Guyon & Elisseeff 2003). According to how they use class label information, feature selection algorithms can be roughly divided into three categories: unsupervised feature selection (Dy & Brodley 2004), semi-supervised feature selection (Zhao & Liu 2007a), and supervised feature selection (Robnik-Sikonja & Kononenko 2003).
These feature selection algorithms can also be categorized into wrappers and filters (Kohavi & John 1997; Das 2001). Wrappers are classifier-specific: the feature subset is selected directly based on the performance of a specific classifier. Filters are classifier-independent: the feature subset is selected based on a well-defined criterion. Wrappers usually obtain better results than filters because they are tied directly to the performance of a specific classifier. However, wrappers are computationally more expensive than filters and lack good generalization capability across classifiers. Fisher score (Bishop 1995) and Laplacian score (He, Cai, & Niyogi 2005) are two popular filter-type methods for feature selection, and both belong to the general graph-based feature selection framework. In this framework, the feature subset is selected based on the score of the entire feature subset, and the score is calculated in a trace ratio form. The trace ratio form has previously been used successfully as a general criterion for feature extraction (Nie, Xiang, & Zhang 2007; Wang et al. 2007). However, when the trace ratio criterion is applied to feature selection, the number of possible feature subsets is enormous, so a brute-force search for the subset with the maximum subset-level score is often prohibitively expensive. Therefore, instead of calculating the subset-level score for all feature subsets, traditional methods calculate the score of each feature (the feature-level score) and then select the leading features according to the ranking of these feature-level scores. A subset selected by feature-level scores is suboptimal in general and cannot guarantee the optimum of the subset-level score.
Copyright © 2008, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
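The gap between feature-level ranking and the subset-level score can be seen on a toy case. The sketch below is illustrative only and is not the algorithm proposed in the paper. It assumes that, for a selected subset, the numerator and denominator traces reduce to sums of per-feature terms, here called `a[i]` and `b[i]`; those names, the helper functions, and the numbers are all hypothetical.

```python
from itertools import combinations

def trace_ratio(a, b, subset):
    """Subset-level score: sum of a[i] over the subset divided by sum of b[i]."""
    return sum(a[i] for i in subset) / sum(b[i] for i in subset)

def select_by_ranking(a, b, k):
    """Traditional approach: rank features by a[i] / b[i] and keep the top k."""
    order = sorted(range(len(a)), key=lambda i: a[i] / b[i], reverse=True)
    return tuple(sorted(order[:k]))

def select_by_brute_force(a, b, k):
    """Exhaustive search over all size-k subsets (feasible only for tiny d)."""
    return max(combinations(range(len(a)), k),
               key=lambda s: trace_ratio(a, b, s))

# Hypothetical per-feature numerator and denominator terms (made up for illustration).
a = [10.0, 50.0, 4.0]
b = [1.0, 10.0, 1.0]

ranked = select_by_ranking(a, b, k=2)      # picks features 0 and 1
best = select_by_brute_force(a, b, k=2)    # picks features 0 and 2

print(ranked, trace_ratio(a, b, ranked))
print(best, trace_ratio(a, b, best))
```

In this example the ranking keeps feature 1 because its individual ratio (5) beats that of feature 2 (4), yet its large denominator term drags the ratio of sums down, so the subset containing features 0 and 2 scores higher. Brute force is used here only to expose the gap; it does not scale, which is the motivation the text gives for a dedicated algorithm.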
In this paper, we directly optimize the subset-level score and propose a novel iterative algorithm that efficiently finds the globally optimal feature subset, that is, the subset that maximizes the subset-level score. Experimental results on UCI datasets and two face datasets demonstrate the effectiveness of the proposed algorithm in comparison with traditional feature selection methods.
Feature Selection ⊂ Subspace Learning
Suppose the original high-dimensional data is $x \in \mathbb{R}^{d}$, that is, the number of features (dimensions) of the data is $d$. The task of subspace learning is to find the optimal projection matrix $W \in \mathbb{R}^{d \times m}$ (usually $m \ll d$) under an appropriate criterion. The $d$-dimensional data $x$ is then transformed to the $m$-dimensional data $y$ by
$$y = W^{T} x, \qquad (1)$$
where $W$ is a column-full-rank projection matrix.
Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (2008)
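One way to read the section heading "Feature Selection ⊂ Subspace Learning" is that selecting features corresponds to a projection matrix whose columns are standard basis vectors. The sketch below works under that assumption, and also assumes W has shape d × m with the projection written as y = Wᵀx. The function names and the sample values are hypothetical.

```python
def selection_matrix(d, selected):
    """Build a d x m matrix whose j-th column is the basis vector e_{selected[j]}."""
    return [[1.0 if row == feature else 0.0 for feature in selected]
            for row in range(d)]

def project(W, x):
    """Compute y = W^T x for a d x m matrix W (list of rows) and a length-d vector x."""
    m = len(W[0])
    return [sum(W[i][j] * x[i] for i in range(len(x))) for j in range(m)]

x = [0.5, -1.2, 3.0, 7.1]   # hypothetical data point with d = 4 features
selected = [2, 0]           # hypothetical choice of m = 2 features

W = selection_matrix(len(x), selected)
y = project(W, x)
print(y)  # [3.0, 0.5]
```

With such a W, the projection simply copies the chosen coordinates of x. Because the columns are distinct basis vectors, W has full column rank, which matches the condition stated for the projection matrix.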
Similar resources
Efficient semi-supervised feature selection with noise insensitive trace ratio criterion
Feature selection is an effective method to deal with high-dimensional data. In many applications such as multimedia and web mining, the data are often high-dimensional and very large in scale, while labeled data are often very limited. In these kinds of applications, it is important that the feature selection algorithm is efficient and can explore labeled data and unlabeled data simultaneo...
Fuzzy-rough Information Gain Ratio Approach to Filter-wrapper Feature Selection
Feature selection for various applications has been carried out for many years in many different research areas. However, there is a trade-off between finding feature subsets with minimum length and increasing the classification accuracy. In this paper, a filter-wrapper feature selection approach based on fuzzy-rough gain ratio is proposed to tackle this problem. As a search strategy, a modifie...
Unsupervised Feature Selection for Relation Extraction
This paper presents an unsupervised relation extraction algorithm, which induces relations between entity pairs by grouping them into a “natural” number of clusters based on the similarity of their contexts. A stability-based criterion is used to automatically estimate the number of clusters. To remove noisy feature words in the clustering procedure, feature selection is conducted by optimizing a ...
A New Framework for Distributed Multivariate Feature Selection
Feature selection is considered an important issue in the classification domain. Selecting good features by maximizing relevance to the class label and minimizing redundancy among features improves classification accuracy. However, most current feature selection algorithms work only in a centralized manner. In this paper, we suggest a distributed version of the mRMR featu...
Selection of Support Vector Kernel Parameters for Improved Generalization
The selection of kernel parameters is an open problem in the training of nonlinear support vector machines. The usual selection criterion is the quotient of the radius of the smallest sphere enclosing the training features and the margin width. Empirical studies on real-world data using Gaussian and polynomial kernels show that the test error due to this criterion is often much larger than the ...